
Conversation

ldematte
Contributor

@ldematte ldematte commented Oct 10, 2025

This PR changes how we gather and compact vector data for transmission to the GPU. Instead of using a temporary file to write out the compacted arrays, we directly use the vector values from the scorer supplier, which are backed by a memory-mapped input. This way we avoid an additional copy of the data.
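Roughly, the merge path now looks like the sketch below (illustrative only: `GpuVectorUploader` and `FloatBufferSink` are hypothetical names standing in for the real classes in this PR, while `FloatVectorValues` is Lucene's per-field vector reader). The point is that the vectors are read straight from the already memory-mapped source instead of being copied into a temporary file first.

```java
import org.apache.lucene.index.FloatVectorValues;
import java.io.IOException;

final class GpuVectorUploader {

    /** Feeds each vector straight from the mmapped source to the GPU-side buffer. */
    static void upload(FloatVectorValues values, FloatBufferSink deviceBuffer) throws IOException {
        int dim = values.dimension();
        for (int ord = 0; ord < values.size(); ord++) {
            float[] vector = values.vectorValue(ord); // backed by a memory-mapped IndexInput
            deviceBuffer.append(vector, 0, dim);      // no temp-file round trip
        }
    }

    /** Hypothetical sink abstraction standing in for the actual GPU/cuVS transfer API. */
    interface FloatBufferSink {
        void append(float[] src, int offset, int length) throws IOException;
    }
}
```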

@ldematte ldematte requested a review from a team as a code owner October 10, 2025 15:14
@ldematte ldematte added >non-issue auto-backport Automatically create backport pull requests when merged :Search Relevance/Vectors Vector search test-gpu Run tests using a GPU v9.2.1 v9.3.0 labels Oct 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 10, 2025
@ldematte ldematte changed the title [Gpu] Optimize merge memory usage [GPU] Optimize merge memory usage Oct 10, 2025
Contributor

@mayya-sharipova mayya-sharipova left a comment


@ldematte Great work. I have not tested it yet, but the way you organized it is impressive. My main comment: do you think we can simplify this PR by breaking it into two separate ones, making this PR only about changes to merges, and doing the changes for flush, ResourcesHolder, and 128Mb in a separate PR? Or are these changes tightly coupled?

@ldematte
Contributor Author

> doing changes for flush, ResourcesHolder, 128Mb in a separate PR?

I can do that: here is the PR #136464

@mayya-sharipova
Contributor

@ldematte Great changes. I have done some benchmarking on my laptop with int8, and I see great recall but, surprisingly, no speedups compared with the main branch:

gist: 1_000_000 docs; 960 dims; euclidean metric

| index_type | index_time (ms) | force_merge_time (ms) | QPS | single segment recall |
|---|---|---|---|---|
| gpu main | 61422 | 69010 | 353 | 0.97 |
| gpu PR | 59035 | 67766 | 296 | 0.98 |

cohere-wikipedia_v2: 934_024 docs; 768 dims; cosine metric

| index_type | index_time (ms) | force_merge_time (ms) | QPS | single segment recall |
|---|---|---|---|---|
| gpu main | 48164 | 47657 | 384 | 0.99 |
| gpu PR | 47824 | 47354 | 393 | 0.99 |

Contributor

@mayya-sharipova mayya-sharipova left a comment


Great work, @ldematte

@ldematte
Contributor Author

@mayya-sharipova I also expected speed-ups on force merge; it seems to be a bit better, but the gain is a few percent, not a multiple.
I think this could be better in a "real" scenario (maybe even Rally), where the disk is contended (search ops, translog, etc.); in these benchmarks we have exclusive use of the drive.
I simulated contention by adding a background copy operation to keep the disk somewhat busy, and there the difference is more noticeable. Still a few percent, not a multiple, but at least you can tell it's there and not just noise.

@ldematte
Contributor Author

@mayya-sharipova I updated the merge as agreed, to avoid using device memory directly due to the cuVS bug.
I'll wait for your re-review; you can just look at the latest commit. Thanks!
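For reference, a minimal sketch of what staging vectors into a separate host memory segment can look like, assuming the `java.lang.foreign` API; the class name and the actual cuVS handoff in this PR are elided here and the code below is only illustrative.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class HostStagingExample {

    /** Copies the merged vectors into an off-heap host segment instead of device memory. */
    static MemorySegment stage(float[][] vectors, Arena arena) {
        int dim = vectors[0].length;
        MemorySegment staging = arena.allocate(
                (long) vectors.length * dim * Float.BYTES,
                ValueLayout.JAVA_FLOAT.byteAlignment());
        long offset = 0;
        for (float[] vector : vectors) {
            MemorySegment.copy(vector, 0, staging, ValueLayout.JAVA_FLOAT, offset, dim);
            offset += (long) dim * Float.BYTES;
        }
        return staging; // handed to the GPU index build instead of a raw device pointer
    }
}
```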

Contributor

@mayya-sharipova mayya-sharipova left a comment


@ldematte Thanks, the latest changes that copy into a separate memory segment LGTM.
